Now that we have processed our spatialExperiment object
and explored the clusters on the tissues (script
002_Cluster_Visualization.R), we can now start working on the
single-cell analysis portion. Most of the sections of this code can be
expanded upon in Section 10 of the Bodenmiller tutorial here: https://bodenmillergroup.github.io/IMCDataAnalysis/single-cell-visualization.html
Again, a copy of the actual code can be found here: https://github.com/micmartinez99/Early_Onset_IMC.
Let’s get started
# Load libraries
library(imcRtools)
## Loading required package: SpatialExperiment
## Loading required package: SingleCellExperiment
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, aperm, append, as.data.frame, basename, cbind,
## colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
## get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
## match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
## table, tapply, union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## expand.grid, I, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
library(cytomapper)
## Loading required package: EBImage
##
## Attaching package: 'EBImage'
## The following object is masked from 'package:SummarizedExperiment':
##
## resize
## The following object is masked from 'package:Biobase':
##
## channel
## The following objects are masked from 'package:GenomicRanges':
##
## resize, tile
## The following objects are masked from 'package:IRanges':
##
## resize, tile
##
## Attaching package: 'cytomapper'
## The following objects are masked from 'package:Biobase':
##
## channelNames, channelNames<-
library(RColorBrewer)
library(scater)
## Loading required package: scuttle
## Loading required package: ggplot2
library(ggplot2)
library(ggpubr)
##
## Attaching package: 'ggpubr'
## The following object is masked from 'package:EBImage':
##
## rotate
library(viridis)
## Loading required package: viridisLite
library(cowplot)
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggpubr':
##
## get_legend
library(patchwork)
##
## Attaching package: 'patchwork'
## The following object is masked from 'package:cowplot':
##
## align_plots
library(corrplot)
## corrplot 0.92 loaded
library(Hmisc)
##
## Attaching package: 'Hmisc'
## The following object is masked from 'package:EBImage':
##
## translate
## The following object is masked from 'package:Biobase':
##
## contents
## The following objects are masked from 'package:base':
##
## format.pval, units
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ lubridate::%within%() masks IRanges::%within%()
## ✖ dplyr::collapse() masks IRanges::collapse()
## ✖ dplyr::combine() masks EBImage::combine(), Biobase::combine(), BiocGenerics::combine()
## ✖ dplyr::count() masks matrixStats::count()
## ✖ dplyr::desc() masks IRanges::desc()
## ✖ tidyr::expand() masks S4Vectors::expand()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::first() masks S4Vectors::first()
## ✖ dplyr::lag() masks stats::lag()
## ✖ ggplot2::Position() masks BiocGenerics::Position(), base::Position()
## ✖ purrr::reduce() masks GenomicRanges::reduce(), IRanges::reduce()
## ✖ dplyr::rename() masks S4Vectors::rename()
## ✖ lubridate::second() masks S4Vectors::second()
## ✖ lubridate::second<-() masks S4Vectors::second<-()
## ✖ dplyr::slice() masks IRanges::slice()
## ✖ dplyr::src() masks Hmisc::src()
## ✖ lubridate::stamp() masks cowplot::stamp()
## ✖ dplyr::summarize() masks Hmisc::summarize()
## ✖ purrr::transpose() masks EBImage::transpose()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dittoSeq)
library(ComplexHeatmap)
## Loading required package: grid
## ========================================
## ComplexHeatmap version 2.18.0
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
##
## If you use it in published research, please cite either one:
## - Gu, Z. Complex Heatmap Visualization. iMeta 2022.
## - Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional
## genomic data. Bioinformatics 2016.
##
##
## The new InteractiveComplexHeatmap package can directly export static
## complex heatmaps into an interactive Shiny app with zero effort. Have a try!
##
## This message can be suppressed by:
## suppressPackageStartupMessages(library(ComplexHeatmap))
## ========================================
library(magick)
## Linking to ImageMagick 6.9.12.93
## Enabled features: cairo, fontconfig, freetype, heic, lcms, pango, raw, rsvg, webp
## Disabled features: fftw, ghostscript, x11
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
##
## The following object is masked from 'package:viridis':
##
## viridis_pal
library(CATALYST)
In script 001_Prepare_Spe_Object.R, we visualized the
UMAP projections for sample_id and source. Here, just so we have a
complete set of UMAPs, we will visualize all relevant metadata
categories on UMAP. We have information for tumor stage now (after
meeting with Dr. Chris Flynn, so we now have to append this to the
spe object and eventually to our img and
mask cytoImageList objects.)
Here, in addition to the metadata categories being colored on the UMAP projections, we can also visualize the marker expression levels on the UMAP.
# Read in the processed spe object
spe <- readRDS("../../Data/001_Preprocessed/PG_Clustered_Spe.Rds")
# Read in the metadata
metadata <- read.csv("../../Data/Updated_Metadata.csv")
samples <- unique(spe$sample_id)
metadata <- metadata[metadata$Sample_ID %in% samples,]
# Add a column for Stage
spe$Stage <- metadata$Stage[match(spe$sample_id, metadata$Sample_ID)]
# UMAP for sample_id
dittoDimPlot(spe, var = "sample_id",
reduction.use = "UMAP_Harmony", size = 0.2) +
theme(axis.title.x = element_text(size = 24, face = "bold"),
axis.title.y = element_text(size = 24, face = "bold"),
axis.text.x = element_text(size = 14),
axis.text.y = element_text(size = 14),
title = element_text(size = 24, face = "bold"),
legend.position = "bottom") +
labs(title = "")
# UMAP for Indication
dittoDimPlot(spe, var = "Indication",
reduction.use = "UMAP_Harmony", size = 0.2,
color.panel = metadata(spe)$color_vectors$Indication) +
theme(axis.title.x = element_text(size = 24, face = "bold"),
axis.title.y = element_text(size = 24, face = "bold"),
axis.text.x = element_text(size = 14),
axis.text.y = element_text(size = 14),
title = element_text(size = 24, face = "bold"),
legend.position = "bottom") +
labs(title = "")
# UMAP for Source
dittoDimPlot(spe, var = "Source",
reduction.use = "UMAP_Harmony", size = 0.2,
color.panel = metadata(spe)$color_vectors$Source) +
theme(axis.title.x = element_text(size = 24, face = "bold"),
axis.title.y = element_text(size = 24, face = "bold"),
axis.text.x = element_text(size = 14),
axis.text.y = element_text(size = 14),
title = element_text(size = 24, face = "bold"),
legend.position = "bottom") +
labs(title = "")
# UMAP for Stage
dittoDimPlot(spe, var = "Stage",
reduction.use = "UMAP_Harmony", size = 0.2) +
theme(axis.title.x = element_text(size = 24, face = "bold"),
axis.title.y = element_text(size = 24, face = "bold"),
axis.text.x = element_text(size = 14),
axis.text.y = element_text(size = 14),
title = element_text(size = 24, face = "bold"),
legend.position = "bottom") +
labs(title = "")
# UMAP for RPhenograph clusters
dittoDimPlot(spe, var = "pg",
reduction.use = "UMAP_Harmony", size = 0.2, do.label = TRUE) +
theme(axis.title.x = element_text(size = 24, face = "bold"),
axis.title.y = element_text(size = 24, face = "bold"),
axis.text.x = element_text(size = 14),
axis.text.y = element_text(size = 14),
title = element_text(size = 30, face = "bold")) +
labs(title = "PG Clusters")
# Clean up the rownames
rownames(spe)[rowData(spe)$use_channel] <- c("aSMA", "Vimentin", "CD163", "Pan CK", "CD15",
"CD45", "FoxP3", "CD4", "E-Cadherin", "CD68", "CD8a",
"GZMB", "Ki67", "Collagen I", "CD3", "CD45RO")
# Visualize the marker expression on UMAP
markers <- rownames(spe)[rowData(spe)$use_channel]
plot_list <- multi_dittoDimPlot(spe, var = markers, reduction.use = "UMAP_Harmony",
assay = "exprs",
size = 0.2,
list.out = TRUE,
axes.labels.show = FALSE,
show.axes.numbers = FALSE,
show.grid.lines = FALSE,
theme_classic())
# Plot each individual markers expression
plot_list <- lapply(plot_list, function(x) x + scale_color_viridis(option = "B") +
theme(axis.title = element_text(face = "bold"),
axis.ticks = element_blank(),
strip.text = element_text(face = "bold"),
title = element_text(face = "bold")))
markerPlots <- plot_grid(plotlist = plot_list)
markerPlots
Now, we have visualized the marker expressions on the UMAPs along
with the clusters. We also have the cluster heatmap and the images
generated in script 002_Cluster_Visualization.R. Based on
all of this information, we can now assign labels to the clusters.
Additionally, we will add a color vector to identify the clusters in
downstream figures.
# Assign celltypes to each cluster based on UMAPs
spe$cellTypes <- ifelse(spe$pg == "1", "Tregs",
ifelse(spe$pg == "2", "Cytotoxic Cells",
ifelse(spe$pg == "3", "Cytotoxic Cells",
ifelse(spe$pg == "4", "Epithelium",
ifelse(spe$pg == "5", "Stroma",
ifelse(spe$pg == "6", "Undefined",
ifelse(spe$pg == "7", "Stroma",
ifelse(spe$pg == "8", "Epithelium",
ifelse(spe$pg == "9", "Undefined",
ifelse(spe$pg == "10", "Stroma",
ifelse(spe$pg == "11", "Proliferating Epithelium",
ifelse(spe$pg == "12", "M2 Macrophage",
ifelse(spe$pg == "13", "T-Helpers",
ifelse(spe$pg == "14", "Epithelium",
ifelse(spe$pg == "15", "CTLs",
ifelse(spe$pg == "16", "Proliferating Epithelium",
ifelse(spe$pg == "17", "Undefined",
ifelse(spe$pg == "18", "Epithelium", "IELs"))))))))))))))))))
# Add a color vector to the metadata slot for celltypes
cellTypeColors <- setNames(c("#3F1B03", "#894F20","#F4AD31", "#1C750C", "#EF8EC1",
"#6471E1", "#F4800C", "cyan3", "darkgrey", "#BF0A3D"),
c("Stroma", "Epithelium", "Proliferating Epithelium", "M2 Macrophage", "Cytotoxic Cells",
"CTLs", "T-Helpers", "IELs", "Undefined", "Tregs"))
metadata(spe)$color_vectors$cellTypes <- cellTypeColors
# Factor the cellTypes
spe$cellTypes <- factor(spe$cellTypes, levels = c("Stroma", "Epithelium",
"Proliferating Epithelium", "M2 Macrophage",
"Cytotoxic Cells", "CTLs", "T-Helpers",
"IELs", "Tregs", "Undefined"))
# Visualize the phenotypes clusters on UMAP
dittoDimPlot(spe, var = "cellTypes",
reduction.use = "UMAP_Harmony",
size = 0.2,
do.label = TRUE,
labels.size = 4,
labels.highlight = TRUE,
color.panel = cellTypeColors) +
theme(axis.title.x = element_text(size = 24, face = "bold"),
axis.title.y = element_text(size = 24, face = "bold"),
axis.text.x = element_text(size = 14),
axis.text.y = element_text(size = 14),
title = element_text(size = 24, face = "bold")) +
labs(title = "")
We can now show the clusters on the tissues as an example
# Read in the images and masks
masks <- readRDS("../../Data/Images/Final_Mask_Dataset.Rds")
# Create a new colData entry in spe that will be a clean ID for plotting purposes
spe$ID <- paste(gsub("_", " ", spe$sample_id), spe$Indication, sep = " ")
mcols(masks)$ID <- paste(gsub("_", " ", mcols(masks)$sample_id), mcols(masks)$Indication, sep = " ")
# Pick an ROI to visualize (you can change this to any ROI in the dataset)
vis <- "ROI_002"
cur_masks <- masks[names(masks) %in% vis]
# Plot cell clusters on this ROI to visualize clustering
plotCells(cur_masks,
object = spe,
cell_id = "ObjectNumber",
img_id = "ID",
colour_by = "cellTypes",
colour = list(cellTypes = metadata(spe)$color_vectors$cellTypes),
image_title = list(text = mcols(cur_masks)$ID,
position = "top",
colour = "white",
margin = c(5,5),
font = 2,
cex = 1.3),
legend = list(colour_by.title.cex = 1,
colour_by.labels.cex = 1,
colour_by.title.font = 3,
margin = 50))
Similarly to how we generated a heatmap for the RPhenograph clusters,
we can now generate a heatmap for the phenotyped clusters. Here we will
make the graphic a bit fancier than the dittoHeatmap
generated in script 001_Prepare_Spe_Object.R by using R
package complexHeatmap. NOTE: There is a bug in
complexHeatmap that will cause R to crash when you draw a
heatmap. To fix this bug, you must load R package magick
alongside complexHeatmap.
# Aggregate clusters across cells by taking the mean cluster counts per cell
image_mean <- aggregateAcrossCells(as(spe[rowData(spe)$use_channel], "SingleCellExperiment"),
ids = spe$pg,
statistics = "mean",
use.assay.type = "counts")
# Transform counts to expression values by arcsin transforming
assay(image_mean, "exprs") <- asinh(counts(image_mean)/1)
# Save the image mean data as a dataframe for manual plotting
imdata <- as.data.frame(t(assay(image_mean, "exprs")))
# Create a metadata dataframe for the PG clusters
clusterMeta <- data.frame(Cluster_Num = c(1:19),
Phenotype = c("Tregs", "Cytotoxic Cells",
"Cytotoxic Cells", "Epithelium", "Stroma",
"Undefined", "Stroma", "Epithelium", "Undefined",
"Stroma", "Proliferating Epithelium", "M2 Macrophage",
"T-Helpers", "Epithelium", "CTLs",
"Proliferating Epithelium", "Undefined", "Epithelium", "IELs"))
clusterMeta$Cluster_Num <- paste("Cluster", rownames(clusterMeta), sep = " ")
# Factor the celltypes
clusterMeta$Phenotype <- factor(clusterMeta$Phenotype, levels =
c("Stroma", "Epithelium",
"Proliferating Epithelium", "M2 Macrophage",
"Cytotoxic Cells", "CTLs", "T-Helpers",
"IELs", "Tregs", "Undefined"))
# Function to max scale the image mean data
max_scale <- function(x) {
(x - min(x)) / (max(x) - min(x))
}
# Apply the function to the dataframe
maxScaled <- as.data.frame(lapply(imdata, max_scale))
# Clean up the rownames for the max-scaled data and for the metadata
rownames(maxScaled) <- paste("Cluster", rownames(maxScaled), sep = " ")
rownames(clusterMeta) <- as.character(rownames(maxScaled))
clusterMeta$Cluster_Num <- NULL
# Ensure the colors are in the order of the factored levels
phenotype_levels <- levels(clusterMeta$Phenotype)
phenotype_colors <- setNames(c("#3F1B03", "#F4AD31", "#894F20", "#1C750C", "#EF8EC1", "#6471E1", "#F4800C", "cyan3", "#BF0A3D", "grey"),
phenotype_levels)
# Create the row annotation
CTAnno <- rowAnnotation(
`Cell Type` = clusterMeta$Phenotype,
col = list(`Cell Type` = phenotype_colors),
show_annotation_name = FALSE
)
# Order the columns
colOrder <- c("E.Cadherin", "Pan.CK", "Ki67",
"aSMA", "Vimentin", "Collagen.I",
"CD45", "CD45RO", "CD3", "CD8a", "CD4", "FoxP3",
"CD68", "CD163", "CD15", "GZMB")
maxScaled <- maxScaled[,colOrder]
# Clean up column names
colnames(maxScaled) <- c("E-cadherin", "Pan-CK", "Ki67", "aSMA", "Vimentin", "Collagen I", "CD45",
"CD45RO", "CD3", "CD8a", "CD4", "FoxP3", "CD68", "CD163", "CD15", "GZMB")
# Set splitting pattern
hmSplit <- rep(1:5, c(3,3,6,2,2))
# Add metadata features
anno <- colData(image_mean) %>%
as.data.frame %>%
select(cellTypes, ncells)
# Add number of cells per cluster as a row annotation
ha_meta <- rowAnnotation(`Number Cells` = anno_barplot(anno$ncells, width = unit(30, "mm")))
# Plot the heatmap
Heatmap(as.matrix(maxScaled),
show_column_names = TRUE,
name = "Max Scale",
cluster_rows = TRUE,
cluster_columns = TRUE,
left_annotation = CTAnno,
column_title = NULL,
column_split = hmSplit,
gap = unit(1, "mm"),
row_title = NULL,
col = rev(colorRampPalette(brewer.pal(11, "RdYlBu"))(100)),
row_dend_width = unit(2, "cm"),
column_dend_height = unit(2, "cm"),
row_dend_gp = gpar(lwd = 2.5),
column_dend_gp = gpar(lwd = 2.5),
right_annotation = ha_meta)
Next, we can visualize a stacked barplot showing the cell type
composition for each individual sample split by Indication. Doing this
is a visual way of differential cell type abundance. We will run actual
statisitcs on these differences in a later code chunk, but the
visualization will comprise one panel of a figure. We want one that is
the absolute cell number per sample and then one that is scaled to be a
perentage.
# Plot a compositional barplot
dittoBarPlot(spe,
var = "cellTypes",
group.by = "sample_id",
split.by = "Indication",
split.adjust = list(scales = "free")) +
scale_fill_manual(values = metadata(spe)$color_vectors$cellTypes) +
scale_y_continuous(labels = percent_format()) +
labs(y = "Percent of Cells",
x = "",
fill = "",
title = "") +
theme(axis.title.x = element_blank(),
axis.title.y = element_text(face = "bold", size = 24),
strip.text = element_text(face = "bold", size = 24),
legend.position = "none")
# Plot the number of cells per ROI
dittoBarPlot(spe,
scale = "count",
var = "cellTypes",
group.by = "sample_id",
split.by = "Indication",
split.adjust = list(scales = "free")) +
scale_fill_manual(values = metadata(spe)$color_vectors$cellTypes) +
labs(y = "Number of Cells",
x = "",
fill = "",
title = "") +
theme(axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.title.y = element_text(face = "bold", size = 24),
strip.text = element_text(face = "bold", size = 24),
axis.ticks.x = element_blank(),
legend.position = "none")
We can now assess the significance of the clusters between early and
late. For the code to achieve this, please see the code file
002_Single_Cell_Analysis.R. Make sure to save an RDS file
of the spatialExperiment object for downstream code.